Hierarchical Dialogue Policy Learning using Flexible State Transitions and Linear Function Approximation
نویسندگان
چکیده
Conversational agents that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem can be addressed either by using function approximation techniques that estimate an approximate true value function, or by using a hierarchical decomposition of a learning task into subtasks. In this paper, we present a novel approach for dialogue policy optimization that combines the benefits of hierarchical control with function approximation. The approach incorporates two concepts to allow flexible switching between subdialogues, extending current hierarchical reinforcement learning methods. First, hierarchical treebased state representations initially represent a compact portion of the possible state space and are then dynamically extended in real time. Second, we allow state transitions across sub-dialogues to allow non-strict hierarchical control. Our approach is integrated, and tested with real users, in a robot dialogue system that learns to play Quiz games.
منابع مشابه
A Non-Strict Hierarchical Reinforcement Learning for Interactive Systems and Robots
Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that estimate the approximate true value function of a policy or by using a hierarchical decomposition of a learning task into subtasks. We present a novel appro...
متن کاملHybrid Reinforcement/Supervised Learning for Dialogue Policies from COMMUNICATOR data
We propose a method for learning dialogue management policies from a fixed dataset. The method is designed for use with “Information State Update” (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in a very large state space and a very large policy space. To address the problem that any fixed dataset will only provide information about s...
متن کاملOn-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...
متن کاملOn-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...
متن کاملHybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets
We propose a method for learning dialogue management policies from a fixed data set. The method addresses the challenges posed by Information State Update (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in a very large state space and a huge policy space. To address the problem that any fixed data set will only provide information abou...
متن کامل